v2 - use list of predictions from stage 1 model

Features to add:

Aggregate feats:

We should create a dictionary of the rank, count, city/country, etc. features, so we can easily merge them when generating more "negative" samples/features for ranking.
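One way to organize this is a dictionary that maps each join key to its feature table, so the same helper can enrich any candidate frame, including newly generated negatives. This is only a minimal sketch: the column names (`city_count`, `city_rank`, `country_count`), the toy values, and the helper `add_aggregate_feats` are all hypothetical.

```python
import pandas as pd

# Hypothetical aggregate tables; names and values are illustrative only.
city_feats = pd.DataFrame(
    {"city_id": [1, 2], "city_count": [10, 3], "city_rank": [1, 2]}
)
country_feats = pd.DataFrame({"country": ["A", "B"], "country_count": [13, 7]})

# Map each join key (a tuple of column names) to its feature table.
feature_tables = {
    ("city_id",): city_feats,
    ("country",): country_feats,
}


def add_aggregate_feats(df, tables):
    """Left-join every feature table onto df using its key columns."""
    out = df
    for keys, feats in tables.items():
        out = out.merge(feats, on=list(keys), how="left")
    return out


# Candidate rows (positives or sampled negatives) get the same features.
candidates = pd.DataFrame({"city_id": [2, 1, 9], "country": ["A", "A", "B"]})
enriched = add_aggregate_feats(candidates, feature_tables)
```

Keys missing from a feature table (here `city_id` 9) end up as NaN after the left join, so they still need an explicit fill or a model that tolerates missing values.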

Leaky or potentially leaky (depends on the test set):

DF of features per city

WARNING! Some features here are NOT calculated correctly: we see too many values, so they are not unique per city_id & hotel!
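A quick way to surface the problem is to list the rows whose key combination appears more than once. The sketch below assumes the key columns are `city_id` and a hotel-level column named `hotel_country`; that second name, the toy data, and the helper `find_duplicate_keys` are assumptions, so substitute the real keys.

```python
import pandas as pd


def find_duplicate_keys(feats, keys):
    """Return the rows whose key combination appears more than once."""
    dup_mask = feats.duplicated(subset=keys, keep=False)
    return feats.loc[dup_mask].sort_values(keys)


# Toy feature table: (1, "A") appears twice with conflicting values.
feats = pd.DataFrame(
    {
        "city_id": [1, 1, 2, 2],
        "hotel_country": ["A", "A", "B", "C"],
        "city_count": [10, 12, 3, 4],
    }
)
dupes = find_duplicate_keys(feats, ["city_id", "hotel_country"])
```

An empty result means the table is safe to join many-to-one. Passing `validate="many_to_one"` to `DataFrame.merge` is another option that raises an error instead of silently multiplying rows.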

add lag features + Train/test/data split
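Lag features can be built with a grouped shift, so values never cross from one trip into the next. A minimal sketch, assuming rows are already in visit order within each trip and that the grouping column is called `utrip_id` (both the name and the toy data are assumptions):

```python
import pandas as pd

# Toy data: two trips, rows already in visit order within each trip.
trips = pd.DataFrame(
    {
        "utrip_id": ["a", "a", "a", "b", "b"],
        "city_id": [5, 7, 9, 3, 4],
    }
)

# Previous and second-previous city within the same trip.
grouped = trips.groupby("utrip_id")["city_id"]
trips["city_id_lag1"] = grouped.shift(1)
trips["city_id_lag2"] = grouped.shift(2)
```

The first rows of each trip have no history and stay NaN. Doing the train/test split by whole trips after this step keeps every lag value inside a single split.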

We may not need to drop these features anymore, but we may want to join them by city id.

join with predicted candidates

Model

All the categorical values must be known from train (the demo used a label encoder). Consider also doing so here at a late step, to avoid unknown values?

Ordinal/categorical encoder
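One possible way to keep unknown values out of the model is to fix the category list on train and send anything unseen to a reserved code. The sketch below uses plain pandas rather than the label encoder from the demo; the helper names and the toy values are hypothetical.

```python
import pandas as pd


def fit_categories(train_col):
    """Collect the category values seen in training, in order of appearance."""
    return pd.Index(train_col.dropna().unique())


def encode(col, categories):
    """Map values to integer codes; unseen or missing values become -1."""
    return pd.Categorical(col, categories=categories).codes


train = pd.Series(["US", "FR", "US", "DE"])
test = pd.Series(["FR", "JP", "US"])  # "JP" never appears in train

cats = fit_categories(train)
train_codes = encode(train, cats)
test_codes = encode(test, cats)
```

Every unseen value shares the code -1, so the model treats them as a single "unknown" bucket instead of failing at prediction time.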

feature importance & evaluation

CatBoost classification/ranking model

CatBoost model

train model

Results

200k sampled data, max

1.2k epochs, default settings, 35% of ranked data:

* 'learn': {'Logloss': 0.0585, 'AUC': 0.959}
* 'validation': {'Logloss': 0.07554, 'AUC': 0.9168}

Feature importance - SHAP